Machine learning models for early prediction of potassium lowering effectiveness and adverse events in patients with hyperkalemia

The aim of this study was to develop a model for early prediction of adverse events and treatment effectiveness in patients with hyperkalemia. We collected clinical data from patients with hyperkalemia in the First Hospital of Zhejiang University School of Medicine between 2015 and 2021. The least absolute shrinkage and selection operator (LASSO) and multivariate logistic regression were used to analyze the predictors on the full dataset. We randomly divided the data into a training group and a validation group, and used LASSO to filter variables in the training set. Six machine learning methods were used to develop the models. The best model was selected based on the area under the curve (AUC). SHapley Additive exPlanations (SHAP) values were used to explain the best model. A total of 1074 patients with hyperkalemia were finally enrolled. Diastolic blood pressure (DBP), breathing, oxygen saturation (SpO2), Glasgow coma score (GCS), liver disease, oliguria, blood sodium, international standardized ratio (ISR), and initial blood potassium were the predictors of the occurrence of adverse events; peripheral edema, estimated glomerular filtration rate (eGFR), blood sodium, base excess, and initial blood potassium were the predictors of therapeutic effect. The extreme gradient boosting (XGBoost) model achieved the best performance (adverse events: AUC = 0.87; therapeutic effect: AUC = 0.75). A model based on clinical characteristics was developed and validated with good performance.

at the expense of long-term cardiac and renal benefits, which is one of the major barriers to controlling disease progression. The estimated risk ratios for increased mortality in patients with hyperkalemia compared with patients without hyperkalemia have been reported to range from 1.1 (ref. 16) to 17.7 (ref. 17). Serum potassium has a U-shaped association with the risk of death: low potassium levels also increase the risk, but the higher the potassium level, the higher the risk of death 18.
A study based on a large U.S. Medicare and commercial claims database containing 1.7 million medical records between 2007 and 2012 showed that the prevalence of hyperkalemia was 34.6% among patients with chronic kidney disease and 30% among patients with heart failure. Hyperkalemia has been a hot topic of clinical research and is being actively explored in terms of both diagnosis and treatment. Adverse events in hyperkalemia are associated with higher plasma [K+] values 19,20. There are no studies on the short-term prognosis and treatment outcomes of patients admitted to hospital with hyperkalemia. Hyperkalemia is associated with significantly higher hospitalization rates and subsequent in-hospital mortality, yet reliable predictors of adverse clinical outcomes and treatment outcomes have not been established; it is therefore important to identify the factors associated with treatment efficacy and adverse events in a timely manner 8. We therefore designed this study, using machine learning (ML) algorithms to analyze the clinical data of patients admitted with hyperkalemia, in order to develop a model that predicts adverse events and treatment efficacy in these patients and screens for patients who need priority attention.

Data source
Data for this study were obtained from the clinical records of patients admitted with hyperkalemia to the emergency department of the First Hospital of Zhejiang University School of Medicine (Hangzhou, China) from January 2015 to December 2021. We collected detailed basic information, vital sign measurements, diagnostic information, laboratory information, and treatment information. The study was approved by the Clinical Research Ethics Committee of the First Affiliated Hospital, Zhejiang University School of Medicine (No. 2022971). Because the study had a retrospective design, written informed consent was waived with the approval of the same committee. This study was performed in line with the principles of the Declaration of Helsinki.

Participants
Inclusion criteria: admission diagnosis of hyperkalemia (serum potassium > 5.5 mmol/L); age ≥ 18 years. Exclusion criteria: age < 18 years; pregnant women; laboratory tests suggesting serum potassium < 5.5 mmol/L; incomplete clinical data or missing data on blood potassium; patients with blood potassium > 10.0 mmol/L on any occasion. A total of 1074 patients with hyperkalemia were finally included.

Research variables
We collected variables based on their association with the outcomes, eliminated variables with a missing rate > 28%, and finally selected 52 candidate variables. Each variable was taken from its first recording after admission. The variables comprised demographic variables, comorbidities, vital signs, laboratory findings, oliguria and Glasgow coma score. Demographic variables included age, gender, smoking, and alcohol consumption. Comorbidities included hypertension, peripheral edema, diabetes mellitus, heart failure, chronic liver disease, tumors, chronic kidney disease, history of hyperkalemia, diabetic nephropathy, and acute gastrointestinal bleeding. Vital signs included heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, respiratory rate, body temperature, oxygen saturation (SpO2), and fraction of inspiration oxygen (FiO2). Among the laboratory results, we selected the following variables: white blood cell count, red blood cell count, hemoglobin, hematocrit, platelet count, glutamic aminotransferase, serum creatinine (SCr), estimated glomerular filtration rate (eGFR), urea, uric acid, initial and last blood potassium, sodium, chloride, total calcium, inorganic phosphorus, international standardized ratio (ISR), fibrinogen (Fib), activated partial thromboplastin time (APTT), prothrombin time (PT), thrombin time (TT), pH, partial pressure of carbon dioxide (pCO2), partial pressure of oxygen (pO2), bicarbonate concentration, base excess (BE), lactate dehydrogenase (LDH), hydroxybutyrate dehydrogenase (HBDH), and creatine kinase-MB (CK-MB). Treatment was considered effective if blood K+ was ≤ 5.5 mmol/L when checked after the last treatment; patients were divided into effective and ineffective treatment groups according to the last blood potassium measured during hospitalization. Adverse events after admission included admission to the ICU, death, and respiratory or cardiac arrest.
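The variable filter and the two outcome definitions above can be written down compactly. The following is a minimal sketch, not the authors' code: the function names, the use of None for a missing value, and the treatment of respiratory arrest and cardiac arrest as separate flags are all assumptions made for illustration. Only the 28% cutoff and the 5.5 mmol/L threshold come from the text.

```python
from typing import Optional

MISSING_RATE_CUTOFF = 0.28  # from the text: variables with a missing rate > 28% are dropped
EFFECTIVE_K_MMOL_L = 5.5    # from the text: last potassium <= 5.5 mmol/L counts as effective


def missing_rate(values: list[Optional[float]]) -> float:
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def keep_variables(table: dict[str, list[Optional[float]]]) -> list[str]:
    """Names of the variables whose missing rate does not exceed the cutoff."""
    return [name for name, column in table.items()
            if missing_rate(column) <= MISSING_RATE_CUTOFF]


def treatment_effective(last_potassium: float) -> bool:
    """Outcome 1: treatment is effective if the last potassium is <= 5.5 mmol/L."""
    return last_potassium <= EFFECTIVE_K_MMOL_L


def adverse_event(icu: bool, death: bool,
                  respiratory_arrest: bool, cardiac_arrest: bool) -> bool:
    """Outcome 2: any of the listed in-hospital events counts as an adverse event."""
    return icu or death or respiratory_arrest or cardiac_arrest
```

With a toy table in which sodium is missing in one of four patients and urea in two of four, only sodium would pass the filter.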

Study design
Influential factors were analyzed on the full dataset using the least absolute shrinkage and selection operator (LASSO) and multivariate logistic regression. In this retrospective cohort study, model development, validation, interpretation, and application were performed sequentially. We randomly divided the full dataset into two parts, with 70% used as training data and 30% as validation data. LASSO was used to filter variables in the training set. For the prediction of adverse events, the data in the training set were balanced using the SMOTE algorithm. Six ML models were used to build predictive models: XGBoost, logistic regression (LR), random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and decision tree (DT). To improve the fairness and reliability of the comparison between models, tenfold cross-validation was used for an initial assessment of performance. The data were normalized using the MinMaxScaler function in the sklearn.preprocessing module before applying KNN, SVM, and logistic regression. First, we found the optimal parameters for each of the six machine learning methods by grid search and fivefold cross-validation in the training set, and then validated them in the test set.
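To make the preprocessing steps concrete, here is a minimal pure-Python sketch of a random 70/30 split, min-max scaling, and the interpolation idea behind SMOTE. It is an illustration under stated assumptions, not the study's pipeline: the study used the MinMaxScaler function in sklearn.preprocessing and a SMOTE implementation that the text does not name, whereas every function name below is hypothetical. Fitting the scaling bounds on the training rows only is also an assumption, since the text does not say how the scaler was fitted.

```python
import random


def train_validation_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and validation parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(train_rows):
    """Per-column (min, max), learned from the training rows only (assumption)."""
    return [(min(column), max(column)) for column in zip(*train_rows)]


def apply_min_max(rows, bounds):
    """Rescale each column with the training bounds.

    Training values land in [0, 1]; validation values can fall slightly
    outside that range because the bounds were not learned from them.
    """
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, (lo, hi) in zip(row, bounds)]
            for row in rows]


def smote_like_sample(minority_rows, k, rng):
    """One synthetic minority sample, following the core SMOTE idea:
    pick a minority row, pick one of its k nearest minority neighbours,
    and interpolate at a random position between the two."""
    base = rng.choice(minority_rows)
    neighbours = sorted(
        (row for row in minority_rows if row is not base),
        key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
    )[:k]
    other = rng.choice(neighbours)
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(base, other)]
```

Oversampling is applied to the training data only, as in the text, so that the validation data keep the original class balance.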

Risk characterization factors for treatment effects and adverse events
The LASSO shrinks variable coefficients to prevent overfitting and to address severe multicollinearity. LASSO regression analysis was performed on the full dataset to screen variables (Supplementary Fig. 1; Supplementary Fig. 2). For adverse events, 13 variables were selected: DBP, breathing, SpO2, GCS, liver disease, oliguria, urea, uric acid, sodium, ISR, pH, BE, and initial blood potassium. For treatment effect, 7 variables were selected: peripheral edema, oliguria, eGFR, urea, sodium, BE, and initial blood potassium. The correlations between the variables were all low (Supplementary Fig. 3). To further control for the effects of confounding factors, multivariate logistic regression analysis was performed. Finally, only peripheral edema, eGFR, sodium, base excess, and initial blood potassium were identified as factors influencing treatment effect (Table 2), and only DBP, breathing, SpO2, GCS, liver disease, oliguria, sodium, ISR, and initial potassium were identified as factors affecting adverse events (Supplementary Table 4).
LASSO regression analysis was then performed on the training set to screen the variables (Supplementary Fig. 4; Supplementary Fig. 5). For adverse events, 15 variables were selected: DBP, breathing, SpO2, GCS, liver disease, oliguria, Fib, uric acid, sodium, ISR, pH, BE, glutamic aminotransferase, FiO2, and initial blood potassium. For treatment effect, 8 variables were selected: peripheral edema, oliguria, eGFR, urea, sodium, BE, hemoglobin, and initial blood potassium. The correlations between the variables were all low (Supplementary Fig. 6).
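For reference, the LASSO screening for a binary outcome can be written as an L1-penalized logistic regression. The formulation below is the standard one and is given as an assumption about the form used, since the text does not state the objective or how the penalty weight was chosen. Here y_i is the outcome of patient i (0 or 1), x_i the vector of the p candidate variables, n the number of patients, and λ ≥ 0 the penalty weight.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \left[
          y_i \left( \beta_0 + x_i^{\top} \beta \right)
          - \log\!\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right)
        \right]
      + \lambda \sum_{j=1}^{p} \left| \beta_j \right|
    \right\}
```

As λ grows, more coefficients are shrunk exactly to zero; the variables with nonzero coefficients are the ones retained.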

Model building and evaluation
Tenfold cross-validation was performed for performance evaluation. Using AUC as the evaluation metric and box plots to give an initial view of the distribution of predictive performance, the XGBoost model showed the best performance (Supplementary Fig. 7). The LASSO-screened variables were used to build models in the training set and to make predictions in the test set. Table 3 and Supplementary Table 5 summarize the performance of the six models in terms of AUC, accuracy, sensitivity, specificity and F1 score. In the prediction of adverse events, the XGBoost model had the best performance in the validation cohort, with an AUC of 0.870 (Fig. 1a), an accuracy of 0.848 and a sensitivity of 0.643, compared with the other ML models (AUC: RF 0.779, LR 0.844, SVM 0.848, KNN 0.685, DT 0.699; accuracy: RF 0.780, LR 0.796, SVM 0.827, KNN 0.814, DT 0.770) (Supplementary Table 5). Decision curve analysis (DCA) measures net benefit at different threshold probabilities. The black line in Fig. 2 assumes that all patients receive the intervention, while the dashed line assumes that no patient receives it. For threshold probabilities from 0.016 to 0.258, the XGBoost model outperformed the other models in terms of net benefit (Fig. 2a).
In predicting treatment effects, the XGBoost model had the best performance in the validation cohort, with an AUC of 0.750 and an accuracy of 0.712 (Table 3), compared with the other ML models (AUC: RF 0.702, LR 0.703, SVM 0.693, KNN 0.605, and DT 0.683; accuracy: RF 0.641, LR 0.666, SVM 0.613, KNN 0.619, and DT 0.694). However, the SVM model had the highest sensitivity (0.661). For threshold probabilities from 0.183 to 0.435 and from 0.488 to 0.685, the XGBoost model outperformed the other models in net benefit (Fig. 2b). After considering several performance metrics together, we chose XGBoost to construct the final model. To check stability, the fixed random seeds were removed and each algorithm was run 1200 times to obtain the mean AUC and its standard deviation (Supplementary Table 6; Supplementary Table 7). The choice of the XGBoost model remained stable.
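The net benefit plotted in a decision curve can be computed directly from predicted probabilities. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names are hypothetical, and the text does not say which software produced the curves in Fig. 2.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on every patient with predicted risk >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference curve: every patient receives the intervention."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def net_benefit_treat_none():
    """Reference curve: no patient receives the intervention."""
    return 0.0
```

A model is useful at a given threshold only if its net benefit exceeds both reference curves, which is what the reported threshold ranges describe.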
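The metrics used to compare the models can be illustrated with a short, dependency-free sketch. The AUC below is computed as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as one half, which is equivalent to the area under the ROC curve. The function names are hypothetical and this is not the code used in the study.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 score from binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # share of true events that are detected
        "specificity": tn / (tn + fp),   # share of non-events correctly ruled out
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```

When events are rare, accuracy can be high even if sensitivity is modest, which is why the metrics are considered together rather than one at a time.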

Model explanation
The SHAP algorithm was used to quantify the contribution of each predictive feature to the predictions of the XGBoost model. The feature importance plot lists the most important features in descending order (Supplementary Fig. 8). To detect positive and negative relationships between features and target outcomes, SHAP values were used to reveal risk factors for adverse events in hyperkalemia (Fig. 3a). Figure 3 shows the distribution of all individuals for each variable; the horizontal axis gives the SHAP value, which increases to the right. The direction of each feature's effect on the XGBoost model (positive or negative) is shown in Fig. 3 (ref. 21): red dots represent higher feature values and blue dots lower feature values, and a higher SHAP value corresponds to a higher predicted probability of the target event. For adverse events, the SHAP feature density scatter plot showed that SpO2 had the strongest predictive value, followed by sodium, BE and initial potassium. Higher uric acid had a positive effect, driving the prediction toward an adverse event, whereas higher SpO2 had a negative effect, driving the prediction toward no adverse event. For therapeutic effect, the SHAP feature density scatter plot showed that initial potassium had the strongest predictive value, followed by eGFR, sodium and hemoglobin (Fig. 3b). Higher initial potassium had a positive effect, driving the prediction toward unsuccessful treatment, whereas higher eGFR had a negative effect, driving the prediction toward successful treatment.
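The idea behind SHAP values can be shown with an exact Shapley computation on a toy model. This is a simplified sketch, not the algorithm applied to the XGBoost model: tree-based SHAP explainers use an efficient tree-specific method and average over a background dataset, whereas the code below enumerates all feature subsets and replaces "absent" features with a single baseline patient. The toy scoring function and all numbers in the usage note are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature that is "present" takes the patient's value from x;
    a feature that is "absent" takes the value from baseline.
    """
    n = len(x)

    def value(present):
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {i}) - value(present))
        contributions.append(total)
    return contributions
```

For a hypothetical score such as 0.5 × potassium − 0.2 × SpO2 + 0.1 × potassium × oliguria, the contributions always sum to the difference between the patient's score and the baseline score. This additivity is the property that the summary plot and the individual force plots rely on.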

Model application
Figure 4 shows the individual force plots for two randomly selected patients: one without an adverse event (a) and one with unsuccessful treatment (b). Each patient's SHAP values indicate which predictive variables are relevant for that patient and how much each variable contributes to the prediction of the target event. The number on the X-axis is the SHAP value, and the values of each feature for the sample are shown below the horizontal line. Red features indicate an increased risk of the target event, and blue features indicate a decreased risk 22. The length of each arrow is proportional to the SHAP value: the longer the arrow, the greater the effect on the prediction 23. The contribution of some variables is too low to be shown, so only the variables with larger contributions appear in Fig. 4. The SHAP value for the target event in patient A was −1.99, and the patient survived after admission. The SHAP value for the target event in patient B was 0.43, and the patient's last potassium was greater than 5.5 mmol/L.
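A force plot is an additive decomposition: the model output for one patient equals a base value plus the sum of that patient's SHAP values. The sketch below shows this arithmetic and converts an output to a probability. It assumes the reported values (−1.99 and 0.43) are on the log-odds scale, which is the usual raw output of a binary XGBoost classifier, but the text does not state the scale. The function names are hypothetical.

```python
import math


def force_plot_output(base_value, contributions):
    """Model output for one patient: base value plus the sum of SHAP values."""
    return base_value + sum(contributions.values())


def log_odds_to_probability(log_odds):
    """Logistic (sigmoid) transform from log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Under that assumption, an output of −1.99 corresponds to a predicted probability of about 0.12 for the target event, and 0.43 to about 0.61, which matches the direction of the two observed outcomes.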

Discussion
No previous study has predicted adverse events and therapeutic effect in patients admitted to hospital with hyperkalemia, so this study is the first to address this question. Its aim was to predict the final potassium-lowering effect in hyperkalemic patients at an early stage, i.e., to identify before treatment the patients in whom potassium lowering is unlikely to succeed, so that attention and treatment can then be focused on them. Therapeutic drugs and duration of treatment were therefore not included in the analysis. In this retrospective cohort study in emergency medicine, we developed and validated six ML algorithms. The XGBoost model outperformed LR, RF, SVM, KNN, and DT. Interpreting the XGBoost model with the SHAP method ensured its clinical interpretability, which allows physicians to better understand the decision process of the model and facilitates its use. We found that the developed model performs best when the DCA threshold probability lies within a certain range. XGBoost has been widely used to predict in-hospital mortality in numerous studies. However, the rate of adverse events in the final cohort of patients with hyperkalemia was only 8.66%. The ROC curve indicated that the XGBoost model was the best, but its sensitivity for the adverse event class was only 0.643. The XGBoost model might therefore not fully provide decision support for clinicians. In clinical practice, it is necessary to weigh the benefits of early prediction of adverse events against their additional costs.
The SHAP method shows the contribution of all variables to the model output not only at the macro level, through the feature density scatter plots and feature importance SHAP values, but also at the micro level, through the individual sample variable impact plots 24. By using SHAP to interpret the XGBoost model, we identified a number of important variables associated with adverse events and therapeutic effect in patients admitted to hospital with hyperkalemia. SHAP specifies whether the effects of variables are positive or negative 25. In the present study, SpO2 and initial potassium were the most important predictive features. A more serious consequence of hyperkalemia is a decrease in myocardial resting membrane potential, which leads to a slowing of myocardial cell conduction velocity and an increase in repolarization rate 3. Hyperkalemia has a significant effect on both conduction and automaticity of cardiomyocytes 26. Both high potassium and low sodium affect the electrophysiological activity of cardiomyocytes, and potassium is necessary for normal cardiomyocyte function, including impulse conduction and coordinated myocardial contraction 9,27. Thus, disturbances in potassium levels predispose to arrhythmias, and the mechanism by which high potassium causes death may be the induction of fatal arrhythmias 28. Reduced tubular flow due to sodium restriction may lead to hyperkalemia 29, and disturbances in both serum sodium and potassium are independently associated with poor prognosis 30. The kidneys are the most important excretory site 31, and when kidney function is abnormal and excretory function is impaired, uric acid concentration increases. The decline in glomerular filtration capacity is a direct reflection of progressive kidney damage. Elevated levels of urea indicate an increased risk of AKI 32. Under normal conditions, the kidneys excrete 90% of the daily potassium intake 12,33,34, and abnormal kidney function is the most common cause of hyperkalemia 18,35,36, so for patients with hyperkalemia we should focus on kidney function 37. In addition, patients with peripheral edema or oliguria were also prone to bad outcomes, and these symptoms might also suggest renal dysfunction 38. Metabolic acidosis occurs in all patients prior to cardiac arrest and, when the base excess is too low, can result in extracellular potassium transfer 5,33,39. It has been suggested that hyperkalemia can cause renal tubular acidosis and lead to peripheral neuropathy in patients with chronic kidney disease 34. It has been noted that metabolic acidosis and AKI are independent predictors of mortality in patients hospitalized with hyperkalemia 40. The GCS is a level-of-consciousness score that can clearly indicate deterioration in neurological function; a lower score indicates a worse condition, and the GCS score is often associated with the risk of death. Hyperkalemia increases the risk of adverse events associated with arrhythmias, which can lead to hypotension and myocardial ischemia 39. In addition, a higher ISR indicates a worse prognosis for the patient, which is consistent with its clinical significance.
There are some limitations in this study. First, variables with high rates of missing values were removed, so potentially important characteristics may not have been included. Second, all data were from China, and there were many unmeasured confounders, such as race and treatment strategy. Third, our study lacked external validation in independent cohorts from other regions or countries, and the applicability of the developed XGBoost model in clinical practice needs further validation. Fourth, the single-center retrospective design limited our ability to identify causal relationships between variables and outcomes. Further prospective randomized controlled trials are therefore needed to confirm the validity of our model. Finally, only adults were recruited in our study, and the predictive value of the XGBoost model for the prognosis and therapeutic effect in children with hyperkalemia is unclear. These findings need to be interpreted with caution, and more evidence is needed to confirm them in the near future.

Conclusions
For hyperkalemia, we developed an interpretable XGBoost prediction model that performed best in predicting the risk of adverse events and therapeutic effect. In addition, applying interpretable ML can accurately identify risk factors and enhance physician confidence in the prediction model. This will help physicians to identify hyperkalemia patients with a high risk of death so that appropriate treatment measures can be taken promptly.

Figure 1. Evaluation of the six machine learning algorithms based on the AUC of the ROC curve. AUC, area under the curve; ROC, receiver operating characteristic; SVM, support vector machine; XGBoost, extreme gradient boosting; (a) adverse events; (b) therapeutic effect.

Figure 3. SHAP summary plot of the features of the XGBoost model. The higher the SHAP value of a feature, the higher the probability of an adverse event. A point is created for each feature attribution value of the model for each patient, so that each patient is assigned one point on the line for each feature. The dots are colored according to the patient's feature values and accumulate vertically to show density. Red represents higher feature values; blue represents lower feature values. SHAP, SHapley Additive exPlanations; (a) adverse events; (b) therapeutic effect; DBP, diastolic blood pressure; SpO2, oxygen saturation; GCS, Glasgow coma score; ISR, international standardized ratio; eGFR, estimated glomerular filtration rate; BE, base excess; FiO2, fraction of inspiration oxygen; Fib, fibrinogen; PT, prothrombin time.

Figure 4. Force plot of model prediction results explained with a random sample. eGFR, estimated glomerular filtration rate; BE, base excess; SpO2, oxygen saturation; GCS, Glasgow coma score; PT, prothrombin time; (a) adverse events; (b) therapeutic effect.

Table 1. Demographic and clinical characteristics at baseline. SBP systolic blood pressure, DBP diastolic blood pressure, MAP mean arterial pressure, SpO2 oxygen saturation, FiO2 fraction of inspiration oxygen, GCS Glasgow coma score, HTN hypertension, DM diabetes mellitus, eGFR estimated glomerular filtration rate, ISR international standardized ratio, Fib fibrinogen, APTT activated partial thromboplastin time, PT prothrombin time, TT thrombin time, pCO2 partial pressure of carbon dioxide, pO2 partial pressure of oxygen, LDH lactate dehydrogenase, HBDH hydroxybutyrate dehydrogenase, CK-MB creatine kinase-MB, 1 yes.

Table 2. Multivariate logistic regression analysis of therapeutic effect. R regression coefficient, SE standard error, OR odds ratio, CI confidence interval, eGFR estimated glomerular filtration rate, 0 no, 1 yes.

Table 3. Performances of the six machine learning models for predicting therapeutic effect. AUC area under the curve, XGBoost extreme gradient boosting, LR logistic regression, KNN k-nearest neighbor, DT decision tree, SVM support vector machine, RF random forest.